1. Field of the Invention
This invention relates to email systems, and more particularly to the detection of content similarities within email documents.
2. Description of the Related Art
It is frequently desirable to efficiently find similar emails stored in a database. Emails are often near duplicates of one another because an email is forwarded or replied to with little added text. However, searching through an extensive database and comparing emails to identify potentially similar ones can be time consuming and resource intensive. One approach is to compute a hash value from the content of each email and then compare the hash values for equality. Unfortunately, such an approach would typically identify only emails that are exact duplicates, since any difference between two emails would typically result in the generation of different hash values. Another possible approach is to compare every word of one email against the words of another to determine similarity. However, such an approach is typically very computationally intensive.
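By way of illustration only, the two related-art approaches described above can be sketched as follows. The sketch is not taken from any particular system: the choice of SHA-256 as the hash function, the use of word-set (Jaccard) overlap as the word-level comparison, the function names content_hash and word_overlap, and the sample email text are all assumptions made for the example.

```python
import hashlib


def content_hash(body: str) -> str:
    """Return a hex digest of the email body (SHA-256 is an assumed choice)."""
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def word_overlap(a: str, b: str) -> float:
    """Fraction of distinct words shared by two bodies (assumed Jaccard measure)."""
    words_a = set(a.lower().split())
    words_b = set(b.lower().split())
    if not words_a and not words_b:
        return 1.0  # two empty bodies are treated as identical
    return len(words_a & words_b) / len(words_a | words_b)


# Hypothetical sample data: a message and a forwarded copy with a short note added.
original = "Please review the attached quarterly report before Friday."
forwarded = "FYI. Please review the attached quarterly report before Friday."

# Hash comparison: the added note changes the digest, so the near duplicate is missed.
same_hash = content_hash(original) == content_hash(forwarded)

# Word-level comparison: detects the similarity, but must inspect every word.
overlap = word_overlap(original, forwarded)

print(same_hash, round(overlap, 2))
```

In this sketch the hash comparison reports the two messages as different even though they share nearly all of their words, while the word-level comparison recognizes the overlap. Applying the word-level comparison to every pair in a database of n emails would require n(n-1)/2 such comparisons, which illustrates why that approach scales poorly.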